A new code transformation technique for nested loops
Authors
Abstract
Good cache utilization is crucial for the performance of any computer program. Numerical linear algebra libraries achieve good cache utilization through explicit loop restructuring (mainly loop blocking), but this requires a complicated analysis of memory access patterns. In this paper, we describe a new source code transformation, called dynamic loop reversal, that can increase temporal and spatial locality. We also describe a formal method for predicting cache behavior and evaluate the model's accuracy against measurements taken with a cache monitor. Comparing the measured numbers of cache misses with the numbers estimated by the model shows that the model is relatively accurate and can be used in practice.
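The abstract does not spell out how dynamic loop reversal works. The sketch below assumes it means alternating the direction of the inner loop on successive outer iterations. Under that assumption, the data touched at the end of one inner pass is still cached at the start of the next. To count misses, the sketch uses a fully associative LRU cache. This is a deliberately simple stand-in, not the paper's formal cache model. The function names `matvec`, `inner_trace`, and `lru_misses` are hypothetical.

```python
from collections import OrderedDict


def matvec(A, x, dynamic_reversal=False):
    """y = A x with plain loops; optionally reverse the inner loop on odd rows.

    Assumption: "dynamic loop reversal" is taken to mean alternating the
    inner-loop direction on successive outer iterations.
    """
    y = [0] * len(A)
    for i, row in enumerate(A):
        cols = range(len(x))
        if dynamic_reversal and i % 2 == 1:
            cols = reversed(cols)  # walk x backwards on odd rows
        for j in cols:
            # Reordering the sum is exact for integers; with floats it
            # can change rounding.
            y[i] += row[j] * x[j]
    return y


def inner_trace(n_outer, n_inner, dynamic_reversal=False):
    """Indices j of x[j] in the order the loop nest above touches them."""
    trace = []
    for i in range(n_outer):
        cols = range(n_inner)
        if dynamic_reversal and i % 2 == 1:
            cols = reversed(cols)
        trace.extend(cols)
    return trace


def lru_misses(trace, capacity, elems_per_line=1):
    """Count misses in a hypothetical fully associative LRU cache.

    capacity is measured in cache lines; elems_per_line consecutive
    elements share one line.
    """
    cache = OrderedDict()
    misses = 0
    for j in trace:
        line = j // elems_per_line
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


if __name__ == "__main__":
    # 4 outer iterations over a 16-line vector, with a cache of 8 lines.
    print(lru_misses(inner_trace(4, 16), capacity=8))                         # 64
    print(lru_misses(inner_trace(4, 16, dynamic_reversal=True), capacity=8))  # 40
```

In this toy setting, the forward order misses on every access. LRU evicts each line just before it is needed again. The alternating order reuses the 8 lines left over from the previous pass, so each pass after the first misses only on the other 8.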
Similar resources
Timing and Code Size Optimization on Achieving Full Parallelism in Uniform Nested Loops
Multidimensional Retiming is one of the most important optimization techniques to improve timing parameters of nested loops. It consists of exploring the iterative and recursive structures of loops to redistribute computation nodes on cycle periods, and thus to achieve full parallelism. However, this technique introduces a large overhead in loop generation due to the loop transformation. The ...
Towards Unimodular Transformations for Non-perfectly Nested Loops
In this paper we discuss a possibility to extend unimodular transformations to non-perfectly nested loops. The main idea behind this extension is to convert a non-perfectly nested loop into a perfectly nested one by moving code into the innermost loop and properly guarding it to avoid multiple execution. This form of the loop can be viewed as an intermediate form for the transformation. Having o...
Optimizing and Parallelizing Loops in Object-Oriented Database Programming Languages
Database programming languages like O2, E, and O++ include the ability to iterate through a set. Nested iterators can be used to express joins. Without program analysis, such joins must be evaluated using a tuple-at-a-time nested-loops join algorithm, because otherwise program semantics may be violated. Ensuring that the program’s semantics are preserved during transformation require...
An Efficient Code Generation Technique for Tiled Iteration Spaces
This paper presents a novel approach for the problem of generating tiled code for nested for-loops, transformed by a tiling transformation. Tiling or supernode transformation has been widely used to improve locality in multi-level memory hierarchies, as well as to efficiently execute loops onto parallel architectures. However, automatic code generation for tiled loops can be a very complex comp...
On Parameterized Tiled Loop Generation and Its Parallelization
Tiling is a loop transformation that decomposes computations into a set of smaller computation blocks. The transformation has proved to be useful for many high-level program optimizations, such as data locality optimization and exploiting coarse-grained parallelism, and crucial for architectures with limited resources, such as embedded systems, GPUs, and the Cell. Data locality and parallelism w...
Loop Distribution and Fusion with Timing and Code Size Optimization for Embedded DSPs
Loop distribution and loop fusion are two effective loop transformation techniques to optimize the execution of the programs in DSP applications. In this paper, we propose a new technique combining loop distribution with direct loop fusion, which will improve the timing performance without jeopardizing the code size. We first develop the loop distribution theorems that state the legality condit...
Journal title:
- Comput. Sci. Inf. Syst.
Volume 11
Publication date: 2014